{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we begin, we will change a few settings to make the notebook look a bit prettier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "%%html\n",
    "<style> body {font-family: \"Calibri\", cursive, sans-serif;} </style>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "# 03 - Parameter Optimization\n",
    "So far, we have used parameters that have been previously reported (namely\n",
    "in the original paper). However, more likely than not, you will be using\n",
    "data of your own, which will require tuning the model's (hyper)parameters.\n",
    "\n",
    "As we all know, hyperparameter tuning can be almost an art of itself. \n",
    "However, fortunately TensorFlow 2.0 has a hyperparameter tuner in\n",
    "form of [Keras Tuner](https://keras-team.github.io/keras-tuner/),\n",
    "In this notebook, we will see how this works in DeepSurvK.\n",
    "\n",
    "This notebook assumes that you have gone through the [basics of DeepSurv](./00_understanding_deepsurv.ipynb)\n",
    "as well as [DeepSurvK's basic usage](./01_deepsurvk_quickstart.ipynb)\n",
    "\n",
    "## Preliminaries\n",
    "\n",
    "Import packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pathlib\n",
    "import numpy as np\n",
    "from sklearn.compose import ColumnTransformer\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "import deepsurvk\n",
    "from deepsurvk.datasets import load_rgbsg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define paths."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "PATH_MODELS = pathlib.Path(f'./models/')\n",
    "\n",
    "# If models directory does not exist, create it.\n",
    "if not PATH_MODELS.exists():\n",
    "    PATH_MODELS.mkdir(parents=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get data\n",
    "We will use the RGBSG dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train, Y_train, E_train = load_rgbsg(partition='training')\n",
    "X_test, Y_test, E_test = load_rgbsg(partition='testing')\n",
    "\n",
    "# Calculate important parameters.\n",
    "n_patients_train = X_train.shape[0]\n",
    "n_features = X_train.shape[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pre-process data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "# Standardization\n",
    "cols_standardize = ['grade', 'age', 'n_positive_nodes', 'progesterone', 'estrogen']\n",
    "X_ct = ColumnTransformer([('standardizer', StandardScaler(), cols_standardize)])\n",
    "X_ct.fit(X_train[cols_standardize])\n",
    "\n",
    "X_train[cols_standardize] = X_ct.transform(X_train[cols_standardize])\n",
    "X_test[cols_standardize] = X_ct.transform(X_test[cols_standardize])\n",
    "\n",
    "Y_scaler = StandardScaler().fit(Y_train)\n",
    "Y_train['T'] = Y_scaler.transform(Y_train)\n",
    "Y_test['T'] = Y_scaler.transform(Y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (Hyper)parameter optimization\n",
    "\n",
    "We will define the parameters that we wish to explore as a dictionary.\n",
    "Notice that all parameters must be given in a list, even if they consist\n",
    "of a single value.\n",
    "\n",
    "We can define the number of epochs (`epochs`) in that same dictionary as well.\n",
    "Technically, we won't optimize this parameter. However, we might\n",
    "want to set it to a given value. In this case, provide only a single value.\n",
    "If more are given, only the first will be considered. If it isn't \n",
    "defined, a default value of 1000 is used.\n",
    "\n",
    "For this example, we will fix most of the reported parameters \n",
    "for the RGBSG dataset and optimize only three of them: \n",
    "- `n_layers` - 1, 16\n",
    "- `n_nodes` - 2, 8\n",
    "- `activation` - `relu`, `selu`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "params = {'epochs':[999],\n",
    "          'n_layers':[1, 16],\n",
    "          'n_nodes':[2, 8], \n",
    "          'activation':['relu', 'selu'],\n",
    "          'learning_rate':[0.154],\n",
    "          'decay':[5.667e-3],\n",
    "          'momentum':[0.887],\n",
    "          'l2_reg':[6.551],\n",
    "          'dropout':[0.661],\n",
    "          'optimizer':['nadam']}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "source": [
    "This will results in testing a very small number of \n",
    "possible combinations (8). Using a grid search with default values, the \n",
    "optimization will use 3 folds (as reported in the original paper) and \n",
    "5 repetitions. So for each parameter combination, a model will be fitted and \n",
    "evaluated 15 times. In total, this will result in 120 fits. Notice how this \n",
    "can result in an exponential increase of computational time required \n",
    "depending on the number of parameters to be optimized. \n",
    "Be careful with this, especially if you are using a large dataset!\n",
    "\n",
    "Then, we will obtain the best parameters using DeepSurvK's `optimize_hp`.\n",
    "\n",
    "> *Why not use an existing (hyper)parameter optimization tool?*\n",
    "> \n",
    "> (Hyper)parameter optimization is a well known issue of deep learning\n",
    "> models. There are a few tools out there that are designed for this\n",
    "> purpose, such as [Talos](https://github.com/autonomio/talos),\n",
    "> using a [Scikit wrapper](https://www.tensorflow.org/api_docs/python/tf/keras/wrappers/scikit_learn)\n",
    "> and even Keras's own [Keras Tuner](https://github.com/keras-team/keras-tuner).\n",
    "> Unfortunately, I couldn't manage to get any of these tools working for\n",
    "> DeepSurvK. This was mainly because DeepSurvK depends not only on `X` and \n",
    "> `Y`, but also `E` (since it is a survival task). This is a problem\n",
    "> since none of these options support this extra parameter.\n",
    "> Maybe I'm wrong and this could be fixed in a future version.\n",
    "> Contributions are always welcome.\n",
    "\n",
    "In theory, we should get the reported values for these parameters \n",
    "(`n_layers = 1`, `n_nodes = 8`, `activation = selu`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "best_params = deepsurvk.optimize_hp(X_train, Y_train, E_train, \n",
    "                                    mode='grid', \n",
    "                                    n_splits=3, \n",
    "                                    n_repeats=5, \n",
    "                                    verbose=True, \n",
    "                                    **params)\n",
    "\n",
    "print(best_params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Alternatively, DeepSurvK also comes with a randomized hyperparameter\n",
    "> search. For this, call `optimize_hp` with parameter `mode = 'random'`.\n",
    "> Additionally, the number of iterations can be defined with the\n",
    "> parameter `n_iter`. If no value is given, it defaults to 25\n",
    "> (which is actually quite small).\n",
    "> \n",
    "> When using randomized search:\n",
    "> * For numerical variables: provide the low and high boundary of the\n",
    "> parameter space. Additional values will be ignored.\n",
    "> * For categorical variables: provide the potential values that you\n",
    "> would like to try (just like in a grid search).\n",
    ">\n",
    "> In both cases, providing a single value will guarantee that is always\n",
    "> used (i.e., it will be fixed).\n",
    "\n",
    "This looks good. Now, as usual, we can just create a new model with the\n",
    "optimized parameters, fit it, and generate predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dsk = deepsurvk.DeepSurvK(n_features=n_features, E=E_train, **best_params)\n",
    "loss = deepsurvk.negative_log_likelihood(E_train)\n",
    "dsk.compile(loss=loss)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "callbacks = deepsurvk.common_callbacks()\n",
    "epochs = 1000\n",
    "history = dsk.fit(X_train, Y_train, \n",
    "                  batch_size=n_patients_train,\n",
    "                  epochs=epochs, \n",
    "                  callbacks=callbacks,\n",
    "                  shuffle=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "deepsurvk.plot_loss(history)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Y_pred_test = np.exp(-dsk.predict(X_test))\n",
    "c_index_test = deepsurvk.concordance_index(Y_test, Y_pred_test, E_test)\n",
    "print(f\"c-index of testing dataset = {c_index_test}\")"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "formats": "ipynb,py:percent"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
